Identification and comparison of m6A modifications in glioblastoma non-coding RNAs with MeRIP-seq and Nanopore dRNA-seq

ABSTRACT The most prominent RNA modification – N6-methyladenosine (m6A) – affects gene regulation and cancer progression. The extent and effect of m6A on long non-coding RNAs (lncRNAs) is, however, still not clear. The most established method for m6A detection is methylated RNA immunoprecipitation and sequencing (MeRIP-seq). However, Oxford Nanopore Technologies recently developed direct RNA-seq (dRNA-seq) method, allowing m6A identification at higher resolution and in its native form. We performed whole transcriptome sequencing of the glioblastoma cell line U87-MG with both MeRIP-seq and dRNA-seq. For MeRIP-seq, m6A peaks were identified using nf-core/chipseq, and for dRNA-seq – EpiNano pipeline. MeRIP-seq analysis revealed 5086 lncRNAs transcripts, while dRNA-seq identified 336 lncRNAs transcripts from which 556 and 198 were found to be m6A modified, respectively. While 24 lncRNAs with m6A overlapped between two methods. Gliovis database analysis revealed that the expression of the major part of identified overlapping lncRNAs was associated with glioma grade or patient survival prognosis. We found that the frequency of m6A occurrence in lncRNAs varied more than 9-fold throughout the provided list of 24 modified lncRNAs. The highest m6A frequency was detected in MIR1915HG, THAP9-AS1, MALAT1, NORAD1, and NEAT1 (49–88nt), while MIR99AHG, SNHG3, LOXL1-AS1, ILF3-DT showed the lowest m6A frequency (445–261nt). Taken together, (1) we provide a high accuracy list of 24 m6A modified lncRNAs of U87-MG cells; (2) we conclude that MeRIP-seq is more suitable for an initial m6A screening study, due to its higher lncRNA coverage, whereas dRNA-seq is most useful when more in-depth analysis of m6A quantity and precise location is of interest. Abbreviations: (dRNA-seq) direct RNA-seq, (GBM) glioblastoma, (LGG) low-grade glioma, (lncRNAs) long non-coding RNAs, (m6A) N6-methyladenosine, (MeRIP-seq) methylated RNA immunoprecipitation and sequencing, (ncRNA) non-coding RNA, (ONT) Oxford Nanopore Technologi; Lietuvos Mokslo Taryba


Background
First chemical RNA modification was discovered by Waldo E. Cohn and Elliot Volkin in 1951 [1]. Almost 70 years later, this knowledge expanded to more than 150 identified RNA modifications [2]. Interestingly, the field of epi-transcriptomics recently regained its interest due to characterization of the regulatory machinery of N6-methyladenosine (m6A) modification [3,4] and a breakthrough of its efficient recognition methods [5,6]. N6-methyladenosine is a reversibly methylated adenosine, which is co-/posttranscriptionally installed by 'writers' [7], interpreted by 'readers' [8], and removed by 'erasers' [4,9]. Thorough mapping and identification of specific RNA molecules containing m6A revealed modification importance and role in gene expression [10], development [11], and disease progression [12]. Recent development of novel approaches for m6A identification revealed limitations of the firstgeneration methods and highlighted directions for further improvements, primarily related to increased registration sensitivity and bioinformatics [13]. Furthermore, majority of the studies solely focuses on the methylation of coding RNAs, while ignoring hugely important and m6A modified non-coding RNA (ncRNA) molecules [14].
Currently, m6A is primarily recognized by MeRIP-seq method based on anti-m6A antibodies [5,6]. These antibodies specifically bind to methylated adenosines and enable identification of the approximate location of modifications. More recently, identification of RNA modifications was introduced by Oxford Nanopore Technologies (ONT) direct RNA-seq (dRNA-seq) method [13]. dRNA-seq enables precise detection of RNA modifications including m6A in its native form, by pulling molecules through the membrane containing embedded nanopore particles. ONT sequencing device registers nucleotides by measuring disruptions in the electric current intensity as an RNA molecule passes through the pore. Even though dRNA-seq is an established method which can register RNA modifications at a single nucleotide resolution, its efficiency and accuracy require further improvements [13].
The aim of the study was (1) to define a list of m6A locations in lncRNAs of U87-MG cell line; and (2) to compare MeRIP-seq and dRNA-seq methods for m6A detection in lncRNAs. We hypothesized that dRNA-seq would result in a more precise profile of lncRNA epi-transcriptome in U87-MG cells, compared to MeRIP-seq. And that molecules identified by both methods could be used in future glioma treatment or screening studies.

Experimental design of sequencing experiments
For both platforms, two replicates (R1 and R2) of U87-MG cell lines were used. In total 6 replicates were obtained: two for MeRIP-seq [expression] (input), two for MeRIP-seq [m6A] (m6A) and two for dRNA-seq (as it does not require input dataset, expression and m6A information is obtained from the same sample. However separate pipelines were applied for the evaluation of lncRNA expression and m6A modification (following indicated as [expression] and [m6A], respectively)). Experimental design and key differences in library preparation are depicted in (Figure 1). MeRIP-seq requires cDNA synthesis and PCR amplification, which results in sequencing of PCR amplicon. On the other hand, dRNA-seq employs 1st strand cDNA synthesis to ensure molecule stability, resulting in sequencing of native RNA. More detailed comparison of the two methods is presented in Table S1.

Methylated RNA immunoprecipitation (MeRIP)
Dominissini and Meyer (2012) protocols were followed [5,6]. 18 mg of polyA enriched RNA was mixed with fragmentation buffer (Sigma Aldrich, cat. No. 17-10,499), heated for 3 min at 94°C. RNA The coloured blocks indicate poly-adenylated RNA molecules that are being analysed. Red asterisks represent m6A modifications in RNA molecules. A native RNA molecule that has retained RNA modifications is analysed by dRNA-seq, while MeRIP-seq is used to analyse short cDNA products and data from two libraries (before and after immunoprecipitation) are compared.
was placed on ice, mixed with 2 mL 0.5 M EDTA, and precipitated in ethanol overnight. 10

MeRIP-seq and data analysis
MeRIP-seq RNA quality and quantity were assessed by Agilent's Bioanalyzer (Agilent Technologies, cat. No. 5067-1513) and Qubit RNA HS kit (Invitrogen, cat. No. Q32852). MeRIP-seq libraries were prepared with Novogene's RIP-seq protocol, which (i) synthesized double-stranded cDNA, (ii) ligated sequencing adapters and polyA to 5' and 3' respectively, and (iii) amplified the libraries with PCR. After library purification and quantification, a single-end 50 bp sequencing was performed on Illumina platform (minimum of 25 M reads/sample). Gene/transcript quantification and m6A calling were computed running nf-core/rnaseq and nf-core/chipseq pipelines, respectively [24]. Non-stranded reads were aligned to Ensembl human reference genome (GRCh38.p13.105) using hisat2 [25]. Further analysis was carried out with default parameters.

Nanopore direct RNA sequencing
PolyA enriched RNA of U87-MG was used for dRNA-seq. Three extracts of U87-MG polyA RNA were mixed to generate homogeneous sample for sequencing. 800-1000 ng of RNA was used to generate nanopore libraries following the manufacturers protocol (Oxford Nanopore Technologies, cat. No. SQK-RNA002, ver.: 6.0.7). Purified libraries were quantified with Qubit dsDNA HS kit (Invitrogen, cat. No. Q32851); At least 80-100 ng of prepared libraries were used for sequencing. A MinION device (Oxford Nanopore Technologies, cat. No. Mk1B) and flow cells (Oxford Nanopore Technologies, cat. No. FLO-MIN106D) were used for sequencing. Manufacturer's instructions were followed for all procedures. All experiments were carried out with a flow cells containing 1200-1500 live pores. Sequencing ran for 48-54 h.

Bioinformatic analysis in glioma samples
Transcript levels of lncRNA genes in patients' samples of Glioblastoma (GBM), Low-grade glioma (LGG) and normal brain were obtained from GlioVis data portal [32]. LncRNA expression association with GBM grades and overall survival were analysed. lncRNA expression differences between pairs p-values were determined using Tukey's Honest Significant Difference. Brain tumor datasets visualisation tool GlioVis was used for Kaplan-Meier analysis of overall survival data of Glioma patients based on the lncRNA expression levels. The log-rank test was used to calculate the p-values for the significance of the difference between the high and low expression of lncRNA.

Quality control information of overall RNAs and only lncRNAs
MeRIP-seq and dRNA-seq are powerful methods that provide a quantitative and informative view of the transcriptome with the number of sequenced molecules being a key element in transcriptome analysis. dRNA-seq resulted in 0.7 and 0.9 M reads for replicate 1 and 2. Median quality of dRNA-seq was ~11Q, meaning that the base call accuracy was ~90%. dRNA-seq replicates resulted in similar quality of read distribution with slightly higher quality reads obtained in the first replicate ( Figure 2). Interestingly the first replicate resulted in longer N50 read when compared to the second replicate (Table 1). MeRIP-seq produced 50 bp reads, which did not differ between replicates due to sequencing design. First and second m6A MeRIP-seq replicates had 39.8 and 38 M reads of high quality >35.9Q, indicating that the probability of an incorrect base call was less than 1 out of 1000. The read quality measurements of both approaches are usual and expected for both platforms.
To investigate potential MeRIP-seq or dRNAseq suitability for m6A identification in lncRNAs, we decided to evaluate quality control (QC) parameters specifically to lncRNAs (Table 2 and Figure 3). To this end we used custom pipeline for lncRNAs alignments and QC. Median quality of lncRNA reads obtained by dRNA-seq was ~11Q, and MeRIP-seq lncRNA is ~36Q ( Table 2). This indicates that read quality is not   affected by molecules biotype in any of the sequencing methods. The N50 read and read length distribution was similar in both lncRNA-alone and overall RNA analyses.

Detection of lncRNA genes in U87-MG cell line using MeRIP-seq and Nanopore dRNA-seq platforms
Both biological replicates (R1 and R2) were used for MeRIP-seq and dRNA-seq analysis; raw sequencing data was processed through the same pipelines, as described in the methods section (for MeRIP-seq and dRNA-seq accordingly). The final datasets, representing MeRIP-seq and dRNA-seq, were obtained by combining the processed sequencing data of both replicates and filtering out non-overlapping genes/transcripts ( Figure 4).
In total MeRIP-seq [expression] identified 23,533 and 27,883 genes from each replicate, of which 4617 and 6262 were assigned to lncRNAs. Between replicates 4070 lncRNAs were conserved ( Figure 4c). Meanwhile, the dRNA-seq [expression] analysis resulted in 11,906 and 12,245 genes from each replicate, of which 312 and 334 were assigned to lncRNAs; 284 genes overlapped between replicates (Figure 4e). A summary of MeRIP-seq and dRNA-seq data of the two U87-MG replicates (at gene level) is provided in (Figure 4d, f). MeRIP-seq [expression] identified ~14 times more lncRNA genes than dRNA-seq, which might be associated with higher reads obtained from MeRIP-seq, and PCR amplification step involved in its library preparation,  which enables capture of lowly expressed lncRNA molecules.

Detection of m6A modifications on lncRNAs in the U87-MG cell line using MeRIP-seq and nanopore dRNA-seq technologies
The overall experimental design resulted in 4 different datasets: 1.  (Table 3). In conclusion, MeRIP-seq identified ~2.8 times more m6A modified lncRNAs. We decided to combine all 4 datasets to obtain a list of m6A modified lncRNA transcripts which expression and m6A modification status have been confirmed by both sequencing platforms (Table 3 and 4). dRNA-seq and MeRIP-seq methods identified 24 overlapping lncRNA molecules with m6A modifications in the U87-MG cell line (Table 4).
We also investigated m6A modification frequency in identified lncRNAs using precise location of m6As from dRNA-seq. We decided to use an average read length rather than actual length of the molecule, because sequencing reads do not always cover the whole length of the molecule. We investigated 24 lncRNA molecules and found a tendency for different m6A modification frequency.
Modification frequency is interpreted as an average m6A occurrence interval in each molecule.  Table 4.

dRNA-seq read coverage analysis of long and short lncRNAs
While MeRIP-seq data provided information about molecules methylation status, it did not produce desired resolution in terms of m6A peak calling; often peak start and end would overlap and preclude identification of the precise peak count. An example of MALAT1 lncRNA MeRIPseq m6A peaks are presented (Figure 5a). Meanwhile, dRNA-seq captured m6A modifications are mapped with more detail and precision (Figure 5b). While MeRIP-seq identified high m6A enrichment in 5' region, with lower-intensity m6A 198* /336# The asterix '*' indicates number of lncRNAs having m6A methylated peaks from MeRIP-seq or methylated RRACH motifs from dRNA-seq. The hashtag '#' indicates overall number of lncRNAs for which RRACH motifs were detected and analysed applying EpiNano pipeline.
signal towards the end of the transcript (Figure 5a), dRNA-seq showed high levels of m6A modifications (14 modified RRACH motifs) in the 3' region ( Figure 5b). Lack of full-length coverage for MALAT1 lncRNA by dRNA-seq precluded identification of m6A modification status of 5' region. Despite that dRNA-seq only resulted in data on the 3' region of MALAT1, and its resolution greatly outperformed the information that MeRIP-seq provided for the full-length gene. In contrast to the long lncRNA, MALAT1, analysis of short lncRNAs (CYTOR, IRF1-AS1 and ZNF295-AS1), with lengths close to the median read length of dRNA-seq (~900 nt), showed dRNA-seq coverage of the entire molecule (Figure S1 and S2). lncRNAs lengths are listed in (Table S2). Further analysis of long lncRNA molecules like MALAT1 and NEAT1 revealed identical pattern of m6A identification in 3' but not in 5' prime region due to dRNA-seq difficulties to obtain enough coverage for 5' region of long (> 1000 nt) RNA molecules. This difficulty is due to the fragility of RNA molecules, during RNA isolation and sequencing library preparation. Interestingly, two high-coverage dRNA-seq locations were observed in NEAT1 gene. First location is, as expected, located in 3' region; however, the second location is present in 5' region. The latter location can only be explained by looking in additional transcripts of NEAT1 gene, which indeed starts in the same region as the second NEAT1 read peak ( Figure 6).

Gliovis expression analysis of m6A modified lncRNAs
After setting the list of lncRNA, in which m6A modifications were identified by both MeRIP-seq and dRNA-seq methods, it was decided to search open-access databases to evaluate if identified  molecules were already found to be associated with pathology. Since high throughput epitranscriptome research is still limited by available methods we were not able to find any information in databases on lncRNAs m6A modifications in glioma specimens. Thus, we decided to check whether the expression changes of highly m6A modified lncRNAs differ between glioma samples; for this purpose we used the analysis at Gliovis. The association between the expression of identified lncRNAs and glioma malignancy, as well as patients' survival, was examined using TCGA and CCGA datasets stored in the Gliovis database [32]. CYTOR, NEAT1, LOXL1-AS1, and ZNF295-AS1 were significantly associated with glioma grade and survival. The increased expression of these lncRNAs is found in high-grade gliomas, and is associated with poor survival prognosis. High expression of ILF3-AS1 was associated with low tumour grade and longer survival (Table 4). Significantly decreased levels of LINC00205, FGD5-AS1, and SNHG7 were characteristic for high-grade gliomas as compared to lower-grade tumours, while SNHG3 showed opposite associations. Gene expression information of 11 screened lncRNAs were not available in the Gliovis database (Table 4).

Discussion
Our results support potential edge for dRNA-seq over MeRIP-seq when analysing specific m6A modifications in lncRNAs. This is of particular importance as lncRNAs and ncRNAs contain more methylation sites than mRNAs. We provide information about 220 precise m6A modifications or 33 m6A peaks in 24 lncRNA molecules of U87-MG cell line, and m6A methylation frequency in those lncRNAs, listing most and least m6A modified lncRNAs. dRNA-seq uses first-strand, unamplified cDNA for sequencing, whereas MeRIP-seq uses PCRamplified cDNA. This results in a high coverage and detection rate of lncRNAs using MeRIP-seq approach. Although the difference in mapped read quantity between these methods may not the best parameter to compare their overall performance. MeRIP-seq reads are ~20 times shorter than dRNA-seq; therefore, the chance of detecting multiple molecules is higher than sequencing a long, continuous molecule, as done in dRNA-seq. However, these long, continuous molecules are in their native form and better represents their biological state by preserving accurate location of RNA modification and exon-exon junctions. In addition, dRNA-seq provides precise m6A location at a single nucleotide resolution. This enables an accurate analysis of the adenosine methylation status in a RRACH-rich transcriptomic regions, whereas the resolution given by the MeRIP-seq is not able to distinguish a specific adenosine methylation status if there are multiple RRACH motifs close to each other. Furthermore, RNA fragmentation step required in MeRIP-seq could potentially result in the underrepresentation of m6As close to the 3' end of RNA. Since NGS-seq provides short reads and TTS shows similarity among transcripts [33,34], the RNA fragments from 3' end mostly composed of polyA and fished-out by MeRIP assay are mostly unannotated and stay uncovered after data processing. In contrast, dRNA-seq starts to sequence RNA molecules from their 3' end and is able to reveal m6A methylation status at 3' end of transcripts, due to its ability to sequence long, continuous RNA fragments.
Low expression levels of lncRNAs coupled with RNA brakes during library preparation, and moderate erroneous of dRNA-seq itself could lead to an underrepresentation of 5' regions of lncRNAs, as illustrated by the MALAT1 example. However, it is not a critical factor for highly expressed or short RNAs up to ~1000 bp. One of the possibilities to overcome low coverage is an enrichment for specific lncRNA molecules during library preparation, modifying the reverse transcriptase adapter (RTA) by incorporating a lncRNA-specific anti-sense sequence. Sequencing of enriched lncRNA molecules should allow to obtain information of m6A status in full length of the transcript, and in high coverage.
The need of single nucleotide resolution is further improved by the latest's studies. Z.-X. Li et al., functionally verified the importance of specific lncRNA m6A modification for nasopharyngeal carcinoma growth and metastasis [35]. This finding could have never been achieved with MeRIP-seq and regular Illumina RNA-seq. Therefore, despite higher base-calling error rate and previously mentioned limitations, dRNA-seq still edges MeRIP-seq for its suitability to study epi-transcriptomic changes in a specific part of the non-coding transcriptome. The advantage of dRNA-seq is particularly noticeable, when RNA modifications are explored in a set of specific transcripts. The long reads outputted by dRNAseq enable the distinguishment of two very similar transcripts, when there is a high percentage of exon overlap between them. Current study revealed that despite MeRIP-seq identified ~14 times more lncRNA transcripts than dRNA-seq, it only revealed 2.8 times more m6A modified lncRNAs as compared to dRNA-seq. Such a data indicates dRNA-seq advantage for higher resolution of identifying methylated adenines in RNAs. Such a feature could be explained by dRNA-seq ability to identify single m6A at single RNA transcript, while MeRIP m6A peak from a single pulled-out fragment would be insignificant if at all. Antibody-based approaches may underestimate the number of m6A sites [36]. This is because MeRIP consist of many stages and at some stages m6A mark could be lost due to ruthless thermal conditions; moreover, at each stage portion of RNA material is lost [37,38].
Moreover, dRNA-seq provides possibility to detect other chemical modifications in transcriptome like pseudo-uridine, m5C, inosine, etc. These modifications could be mapped at a single nucleotide resolution in all expressed transcripts and provide an overall profile of the epitranscriptome. Even though the lack of bioinformatic tools for other epi-transcriptomic RNA modifications is currently a limiting step, dRNAseq data could be reanalysed with novel tools upon development. Specifically in our context of m6A identification in lncRNAs the lack of annotated ncRNAs is an issue, especially if MeRIP-seq is used, and secondly, unlike coding genes, lncRNA genes often overlap [39]. Overlapping could lead to discovery of false positives; this can be easily overcome by manual screening of the data in the MeRIP-seq case. Especially if the list of modified lncRNAs is small.
We identified 24 overlapping lncRNA between MeRIP-seq and dRNA-seq. We suggest that identified molecules were modified in similar regions, even though dRNA-seq provide a more precise location. Furthermore, we hypothesize that this list predominantly consists of molecules with modifications in 3' region for previously explained, 5' region underrepresentation, reasons. We successfully mapped 14 m6A modifications at a single nucleotide resolution within MALAT1 3' region which provided more information about MALAT1 m6A status than MeRIP-seq from the whole gene. Additionally, we were able to calculate the frequency of m6A modifications in lncRNAs, which provides a better understanding of the m6A status, than total number of m6A modifications per molecule. Based on mRNA research done by B. Liu et al., where authors revealed that m6A modification is affecting secondary structures of mRNAs [40,41], we theorize that m6A frequency could affect the secondary structure of lncRNAs, leading to its altered function. We believe that higher m6A frequency in lncRNAs can lead to its less flexible folding and weaken function; however, this must be proven by functional experiments.
Gliovis dataset analysis revealed that the expression levels of more than half lncRNAs that were detected as m6A modified using both methods were significantly associated with glioma malignancy grade or patient survival prognosis or both parameters. Taken all together it suggests that glioma pathogenesis and progression could be better explained by looking at combinatory analysis of transcriptomics and epi-transcriptomics rather than just transcriptomics alone. Future perspectives for m6A modification frequency could be further explored, and it may provide a better indication of cancer malignancy than global m6A level or expression levels of m6A machinery. Specific lncRNA targets and precise locations of m6A modifications detected in current research could be useful for future studies to validate the suitability of identified molecules for glioma screening and potential novel therapeutics, especially as CRISPR RNA editing already enables location-specific m6A installation and removal. This approach enables functional studies for m6A effects, and later could be repaired by genome editing using CRISPR base editing or CRISPR prime editing for removing undesired RRACH motifs.
Current studies have shown that the abnormal expression of m6A-related genes is closely associated with the tumorigenesis and progression of glioma [42,43]. However, most of these studies only focus on RNA expression of m6A 'writers' (methyltransferases), 'readers' (signal transducers), and 'erasers' (demethylases), but not to their specific targets. LncRNA molecules in most cases contain more than one adenine that can be modified. Very little is known about the functions of specific lncRNAs, their secondary structure and possible interactions with DNA, mRNA, miRNA, or proteins. Even less is known about how m6A modifications occurring at a specific location on a lncRNA molecule can influence the functions of that lncRNA.
In this study, we attempted to determine which lncRNA molecules potentially undergo changes of m6A modifications in gliomas. The list of 24 lncRNA molecules that undergo m6A modifications in the glioblastoma cell line allows us to assume that these modifications are possibly important for the growth and development of gliomas. We believe that higher m6A frequency in lncRNAs can lead to its less flexible folding and weaken function; however, this must be proven by functional experiments. In summary, we provide a high accuracy list of m6A modification locations in 24 lncRNA molecules of U87-MG cells.
Extensive studies in human glioblastoma samples are necessary to confirm the importance of identified lncRNAs m6A modifications in the processes of gliomagenesis, response to treatment, or prognosis. Since the amounts of tissue material are very limited when working with human or animal specimens, and the large amounts of RNA (accordingly, tissue) are required for the m6A detection assays, especially MeRIP-seq, it would be very difficult, if at all possible, to perform transcriptome-wide m6A methylation analysis applying both MeRIP-seq and dRNA-seq assays on the tissue from the same individual. After examining the advantages and disadvantages of m6A detection methods applying U87 cell lines, we are able to conclude which method would be more appropriate for m6A analysis in tissue studies.
Our data suggests that dRNA-seq is more suitable for studying chemical modifications of specific lncRNAs, rather than MeRIP-seq when single nucleotide resolution and m6A frequency are required or when the region of interest is in 3' region. On the other hand, MeRIP-seq is advantageous when screening of novel or lowly expressed molecules carrying m6A modification is intended to be done. We conclude that MeRIP-seq is more suitable for an initial m6A studies, due to its higher lncRNA coverage, whereas dRNA-seq is most useful when a specific list of m6A modified lncRNA molecules are already known, and a more in-depth analysis of m6A quantity and precise location is of interest.